Translation initiation site prediction on a genomic scale: beauty in simplicity

Authors

  • Yvan Saeys
  • Thomas Abeel
  • Sven Degroeve
  • Yves Van de Peer
Abstract

MOTIVATION: The correct identification of translation initiation sites (TIS) remains a challenging problem for computational methods that automatically try to solve this problem. Furthermore, the lion's share of these computational techniques focuses on the identification of TIS in transcript data. However, in the gene prediction context the identification of TIS occurs on the genomic level, which makes things even harder because at the genome level many more pseudo-TIS occur, resulting in models that achieve a higher number of false positive predictions.

RESULTS: In this article, we evaluate the performance of several 'simple' TIS recognition methods at the genomic level, and compare them to state-of-the-art models for TIS prediction in transcript data. We conclude that the simple methods largely outperform the complex ones at the genomic scale, and we propose a new model for TIS recognition at the genome level that combines the strengths of these simple models. The new model obtains a false positive rate of 0.125 at a sensitivity of 0.80 on a well annotated human chromosome (chromosome 21). Detailed analyses show that the model is useful, both on its own and in a simple gene prediction setting.

AVAILABILITY: Datafiles and a web interface for the StartScan program are available at http://bioinformatics.psb.ugent.be/supplementary_data/.
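The abstract reports performance as a false positive rate at a fixed sensitivity. As a rough illustration of how such a figure can be read off a set of scored candidates, here is a minimal Python sketch. The function name `fpr_at_sensitivity` and the toy scores and labels are hypothetical and invented for this example. This is not the StartScan implementation, and it does not reproduce the paper's numbers.

```python
def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, false_positive_rate) for the highest
    score threshold whose sensitivity reaches target_sensitivity.

    scores: classifier scores, higher means "more likely a true TIS"
    labels: 1 for a true TIS, 0 for a pseudo-TIS (hypothetical toy encoding)
    """
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Walk through the candidates from the highest score down, lowering
    # the decision threshold one distinct score value at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Candidates with tied scores are accepted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        sensitivity = true_pos / positives
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, false_pos / negatives
    raise RuntimeError("target sensitivity was never reached")


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
threshold, sens, fpr = fpr_at_sensitivity(toy_scores, toy_labels, 0.80)
print(threshold, sens, fpr)
```

On genomic data the negatives (pseudo-TIS) vastly outnumber the positives, so even a small false positive rate can correspond to a large absolute number of false predictions.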

Similar resources

Techniques for Recognition of Translation Initiation Sites

Correct prediction of the translation initiation site is an important issue in genomic research. In this chapter, an in-depth survey of half a dozen methods for computational recognition of translation initiation sites from mRNA, cDNA, and genomic DNA sequences is given. These methods span two decades of research on this topic, from the perceptron of Stormo et al. in 1982 to the systematic m...

Using feature generation and feature selection for accurate prediction of translation initiation sites.

Correct prediction of the translation initiation site (TIS) is an important issue in genomic research. We show that feature generation together with correlation based feature selection can be used with a variety of machine learning algorithms to give highly accurate translation initiation site prediction. Only very few features are needed and the results achieve comparable accuracy to the best ...

Improving the Accuracy of Classifiers for the Prediction of Translation Initiation Sites in Genomic Sequences

The prediction of the Translation Initiation Site (TIS) in a genomic sequence is an important issue in biological research. Although several methods have been proposed to deal with this problem, there is a great potential for the improvement of the accuracy of these methods. Due to various reasons, including noise in the data as well as biological reasons, TIS prediction is still an open proble...

Deriving ribosomal binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction.

Accurate prediction of the position of translation initiation (N-terminal prediction) is a difficult problem. N-terminal prediction from DNA sequence alone is ambiguous if several candidate start sites are close to each other. Protein similarity search is usually unable to indicate the true start of a gene as it would require a strong protein sequence similarity at the N-terminal portion of a p...

Using amino acid patterns to accurately predict translation initiation sites

The translation initiation site (TIS) prediction problem is about how to correctly identify TIS in mRNA, cDNA, or other types of genomic sequences. High prediction accuracy can be helpful in a better understanding of protein coding from nucleotide sequences. This is an important step in genomic analysis to determine protein coding from nucleotide sequences. In this paper, we present an in silic...

Journal:
  • Bioinformatics

Volume 23, Issue 13

Pages: -

Publication date: 2007